Implications for cisgender female underrepresentation, small sample sizes, and misgendering in sport and exercise science research

A sex-data gap, from testing primarily males, results in a lack of scientific knowledge for other groups (females, transgender individuals). It is unknown whether typical recruitment and participant characterization cause incorrect statistical decisions, so three factors were evaluated: 1) underrepresenting cisgender females, 2) recruiting small sample sizes, and 3) misgendering. Data from the National Health and Nutrition Examination Survey (2003–2004) were evaluated for sex differences after removing missing values (N = 3,645; F = 1,763). Disparities were determined by utilizing sample sizes common in sport and exercise science research; mean sample size N = 187, median sample size N = 20. Participants were randomly allocated into datasets in an imbalanced manner (33.5% females, 66.5% males). Potential effects of misgendering were determined at rates of 2% and 5%. Differences between the complete data set and expected decisions were evaluated through Chi-squared (χ2) goodness of fit with significance at p < .05. When the entire dataset was evaluated as if a sex testing disparity was present, decisions were not altered (χ2 = .52, p = .47). Differences were observed for mean sample size (χ2 = 4.89, p = .027), median sample size (χ2 = 13.52, p < .001), and misgendering at 2% (χ2 = 13.52, p < .001) and 5% (χ2 = 13.52, p < .001). Recruitment practices in sport and exercise science research should be revisited, as testing primarily cisgender males has consequences, particularly in small sample sizes. Misgendering participants also has consequences on ultimate decisions and interpretations of data, regardless of sample size. Inclusiveness is needed in helping all individuals feel valued and respected when participating in sport and exercise science research.


Introduction
Deficiencies in scientific knowledge that occur due to unequal recruiting and testing practices between males and females (typically cisgender in the literature, although not explicitly stated) are known as the "sex-data gap" [1,2]. Issues highlighting the sex-data gap in sport and exercise science were identified by Costello et al., who reported a disparity (39% of participants were females, 61% of participants were males) in three leading journals [3]. A follow-up by Cowley et al. doubled the number of journals and found that little changed in testing disparities during the ensuing six-year period (34% of participants were females, 66% of participants were males) [1]. The most recent investigation into the sex-data gap by Garver et al. reported similar tendencies in a student-focused journal (36% of participants were females, 64% of participants were males) [2]. The consequences of continued underrepresentation are impactful, as females experience sport and exercise and associated recovery differently than males [1] and lack appropriate scientific data to make informed decisions about adjusting important training-based variables [3]. There is a clear need to be inclusive when carrying out high-quality investigations [2].
The literature contains ample evidence of sex differences between females and males, particularly when body composition measures are evaluated [4]. For example, in a survey of healthy Han adults from Shaanxi Province, China, significant differences were noted for body mass (females = 58 ± 9 kilograms [kg], males = 74 ± 11 kg, p < .001) but not body mass index (BMI) (females = 23 ± 3 kg•meter−2 [kg•m−2], males = 25 ± 3 kg•m−2, p = .706) [5]. Significant sex differences for BMI, however, have been reported between middle-aged African Americans (females = 31 ± 5 kg•m−2, males = 29 ± 5 kg•m−2, p < .05) [6]. Differences in subscapular and triceps skinfold measures were noted in healthy young Japanese adults (subscapular: females = 19 ± 5 millimeters [mm], males = 12 ± 4 mm; triceps: females = 20 ± 5 mm, males = 10 ± 5 mm) [7]. Waist girth has been shown to be significantly lower in females from New Zealand in the early- (70 ± 9 centimeters [cm]), late- (74 ± 9 cm), and post-pubertal stages (79 ± 14 cm) than males (early- = 72 ± 10 cm, late- = 79 ± 8 cm, post-pubertal = 84 ± 8 cm, all comparisons p < .001) [8]. Thus, based on the available literature, there is strong evidence of body composition differences between females and males. It is unknown whether typical sport and exercise science recruiting and sample size testing may ultimately affect outcomes and decisions. Considering the sex-data gap and frequency of binary participant characterization (cisgender female, cisgender male), the field of sport and exercise science may be facing an invisible issue. It is unknown whether the typical approaches to recruitment and participant characterization have caused incorrect statistical decisions about differences in body composition among genders. Three factors may be at play: underrepresenting cisgender females, recruiting small sample sizes, and misgendering.
While sample sizes can influence significance testing and subsequent interpretation, such decisions require understanding and nuance [9]. Marsh et al. investigated 30 well-known models of confirmatory factor analysis from the Self Description Questionnaire and reported that sample size substantially affected all but one index [10]. Adequate sample size is an issue that has plagued sport and exercise science research for the past half century [11]. The issue persists, as an evaluation of 120 randomly selected manuscripts published in the Journal of Sports Sciences found a median sample size of 19 [12], and a 30-year evaluation of 676 investigations published in the Journal of Applied Biomechanics reported the majority (71%) utilized 2-20 participants [13]. A recent investigation of 806 studies on human subjects published in the International Journal of Exercise Science (IJES) over a 14-year period had a median sample size of 20 participants [2]. Other fields are concerned with the sampling of rare species, which naturally have a small number of occurrences [14]. However, it cannot be argued that female participants in sport and exercise science have a similar distribution. To our knowledge, the intersection of recruiting traditionally small sample sizes with a disparity of female participants has not been evaluated in sport and exercise science research.
Another area of concern in sport and exercise science research is appropriately representing individuals who do not identify within a gender-normative binary definition of sex. Garver et al. reported that, of 151,043 participants evaluated in the IJES self-study, one participant identified as transgender, three identified as other, and one declined to identify their gender [2]. The current estimation is that as much as 2% [15] to 5% [16] of the U.S. population identifies as transgender or nonbinary. Thus, between 3,031 and 7,552 participants in the IJES self-study could have been classified as the incorrect gender, or as a gender other than cisgender female or cisgender male. Based on the available literature, it seems reasonable to conclude that many sport and exercise science researchers do not have a mechanism established to evaluate the gender constitution of their recruited samples by a method other than partitioning into female and male (implying cisgender female and cisgender male). It also seems reasonable to conclude that many participants included in sport and exercise science research may be misgendered. When a person is described in a language that does not coincide with their gender identity, misgendering occurs [17]. In healthcare, the consequences of misgendering include individuals not utilizing healthcare [18], delays in seeking and obtaining medical care leading to increased emergency treatment [19], and receiving incompetent care or being denied treatment [20]. To our knowledge, the consequences of misgendering in sport and exercise science research have not been evaluated.
Several questions remain to be addressed in the sport and exercise science literature: 1) the effect of the consistent recruiting imbalance between females and males, 2) the impact of small sample sizes, and 3) the consequences of misgendering. The purpose of this investigation was to determine how statistical decisions may change when considering the effect of these three questions. Body composition measures were utilized because studies have reported significant differences between people characterized according to the gender-normative binary definition of sex. It was hypothesized that recruiting an imbalanced ratio of females to males, utilizing small sample sizes, and misgendering individuals would all negatively affect decisions regarding between-sex differences in body composition measures.

Participants
Fully anonymized data for this study were obtained from public use datasets of the National Health and Nutrition Examination Survey (NHANES) released from 2003 to 2004 (National Center for Health Statistics Research Ethics Review Board approval protocol #98-12). During the written informed consent process, survey participants were guaranteed that collected data would be used only for the stated purpose and would not be disclosed or released in accordance with section 308(d) of the Public Health Service Act (42 U.S.C. 242m). Conducted by the Centers for Disease Control, the NHANES survey collects representative information from the United States population in the form of both survey and physical examination measures relating to health and nutritional status. Details of the survey and laboratory procedures are available elsewhere [21,22].
The initial dataset contained information from 9,041 participants. After participants aged 0-17 were removed, 4,965 participants aged 18-85 years (y) old remained (n = 2,587 females, n = 2,378 males). Body composition data extracted for the purpose of this analysis included the dependent variables of BMI, subscapular skinfold, triceps skinfold, waist girth, and body mass. Cases were removed (n = 496 males, n = 824 females) if any dependent variable was missing, resulting in a final dataset of 3,645 participants (see Table 1).
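The participant flow described above can be reconciled with a few lines of arithmetic. The following is a minimal sketch that uses only the counts reported in the text; the variable names are illustrative and not part of the original analysis.

```python
# Counts reported in the text for adults aged 18-85 y.
adults = {"female": 2587, "male": 2378}
# Cases removed because at least one dependent variable was missing.
removed_missing = {"female": 824, "male": 496}

# Per-sex counts remaining after the exclusions.
final = {sex: adults[sex] - removed_missing[sex] for sex in adults}
total_adults = sum(adults.values())
total_final = sum(final.values())
female_share = final["female"] / total_final

print(final)                      # per-sex counts in the final dataset
print(total_adults, total_final)  # adults before and after exclusions
print(round(female_share, 3))     # proportion of females in the final dataset
```

The resulting totals agree with the 4,965 adults and the final dataset of 3,645 participants (1,763 females) stated in the text.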

Protocol
Our initial analysis was to test for sex differences for each dependent variable in the overall sample, as well as by age category (18-29 y, 30-39 y, 40-49 y, 50-59 y, 60-69 y, 70-85 y). This overall sample served as the baseline for differences (p < 0.05) and effect size (small, medium, large) decisions.
We then tested to determine whether an imbalance in recruiting fewer females (33.5% females versus 66.5% males), as has been consistently shown in the sport and exercise science literature [1–3], would affect decisions that were made at baseline. To do this, the entire male sample was retained, and two-thirds of the female sample was randomly removed using the random function in Microsoft Excel (Version 16.72, Microsoft Corporation, Redmond, WA). The analysis was conducted overall as well as by age category.
Next, the disparity in recruiting and testing was determined by utilizing sample sizes commonly employed in the sport and exercise science literature. According to Garver et al. [2], the mean sample size was 187 participants (referred to as "Large Ratio" moving forward) and the median sample size was 20 participants (subsequently referred to as "Small Ratio"). Participants were randomly allocated into the Large Ratio and Small Ratio datasets in an imbalanced manner (33.5% females, 66.5% males) using the random function in Microsoft Excel. The analyses for Large Ratio and Small Ratio were conducted overall as well as by age category.
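The imbalanced random allocation can be illustrated with a short sketch. The study used the random function in Microsoft Excel; the Python version below is a stand-in, and the function name `draw_imbalanced_sample`, the placeholder records, and the rounding of the female count to the nearest whole participant are all assumptions made for illustration.

```python
import random

FEMALE_SHARE = 0.335  # proportion of females reported in the text


def draw_imbalanced_sample(females, males, n_total,
                           female_share=FEMALE_SHARE, seed=None):
    """Randomly draw n_total records with a fixed female share.

    Rounding to the nearest whole participant is an assumption;
    the text does not state how fractional counts were handled.
    """
    rng = random.Random(seed)
    n_female = round(n_total * female_share)
    n_male = n_total - n_female
    # Sampling without replacement from each sex separately.
    return rng.sample(females, n_female), rng.sample(males, n_male)


# Hypothetical placeholder records (IDs only); the pool sizes match
# the final dataset described in the text.
females = [("F", i) for i in range(1763)]
males = [("M", i) for i in range(1882)]

large_f, large_m = draw_imbalanced_sample(females, males, 187, seed=1)
small_f, small_m = draw_imbalanced_sample(females, males, 20, seed=1)

print(len(large_f), len(large_m))  # 63 124 under the rounding assumption
print(len(small_f), len(small_m))  # 7 13 under the rounding assumption
```

Under these assumptions, a Small Ratio draw contains only seven female records, so each individual carries considerable weight in any between-sex comparison.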
Finally, the potential effects of misgendering in the sample were determined. These effects were determined at a rate of 2% [15] and at a rate of 5% [16] in both Large Ratio and Small Ratio. As was done for the random allocation disparity described above, misgendered participants were randomly determined. For example, transgender females who were classified as males were allocated into the dataset as females. Similarly, transgender males who were classified as females were allocated into the dataset as males. The analyses for Large Ratio with misgendering occurring at 2% and 5% and Small Ratio at 5% were conducted overall as well as by age category. Because sport and exercise science investigations recruit sexes in an unequal ratio, the number of transgender people randomly included in these datasets followed the same distribution (i.e., 66.5% were added as transgender females, and 33.5% were added as transgender males). Because the Small Ratio with misgendering occurring at 2% represented less than a whole number of misgendered participants, the consequences of misgendering a single participant were determined, both in the case of a misgendered male and in the case of a misgendered female. These analyses were conducted overall as well as by age category.
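One reading of the reallocation step is sketched below. The helper names (`expected_misgendered`, `relabel`), the placeholder records, and the rounding rules are assumptions for illustration only; the exact procedure used in Excel is not specified in the text.

```python
import random


def expected_misgendered(n_total, rate):
    """Expected number of misgendered participants at a given rate."""
    return n_total * rate


def relabel(sample, n_relabel, share_to_female=0.665, seed=None):
    """Return a copy of sample with n_relabel sex labels switched.

    sample is a list of (label, value) pairs with label "F" or "M".
    A share_to_female fraction of the switched records moves from "M"
    to "F" (transgender females previously classified as males); the
    remainder moves from "F" to "M". Rounding is an assumption.
    """
    rng = random.Random(seed)
    n_to_female = round(n_relabel * share_to_female)
    n_to_male = n_relabel - n_to_female
    male_idx = [i for i, (lab, _) in enumerate(sample) if lab == "M"]
    female_idx = [i for i, (lab, _) in enumerate(sample) if lab == "F"]
    out = list(sample)
    for i in rng.sample(male_idx, n_to_female):
        out[i] = ("F", out[i][1])
    for i in rng.sample(female_idx, n_to_male):
        out[i] = ("M", out[i][1])
    return out


# Expected counts at the two rates and two sample sizes from the text.
for n in (187, 20):
    for rate in (0.02, 0.05):
        print(n, rate, expected_misgendered(n, rate))

# Hypothetical Large Ratio sample: 124 male and 63 female records.
sample = [("M", i) for i in range(124)] + [("F", i) for i in range(63)]
relabeled_2 = relabel(sample, round(expected_misgendered(187, 0.02)), seed=1)
n_female_after = sum(1 for lab, _ in relabeled_2 if lab == "F")
print(n_female_after)
```

At a rate of 2%, a sample of 20 yields an expected count below one participant, which is consistent with the decision to examine a single misgendered participant in the Small Ratio dataset.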

Statistical analysis
Differences in the binary sex comparison between females and males were evaluated through a one-sided independent t-test for each dependent variable (IBM SPSS Statistics, Version 28.0.1.0, IBM Corp., Armonk, NY). Significance was accepted at the p ≤ 0.05 level. Effect sizes were determined through Cohen's d, with small = 0.00-0.49, medium = 0.50-0.79, and large ≥ 0.80 [23].
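The effect size calculation and classification can be sketched as follows. The analysis itself was run in SPSS; this stand-alone version assumes the pooled-standard-deviation form of Cohen's d and an equal-variance t statistic, and the data values are hypothetical. The p value is left to a statistical package because the t distribution is not available in the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    return sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                / (na + nb - 2))


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (assumed form)."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)


def t_statistic(a, b):
    """Independent-samples t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / (pooled_sd(a, b) * sqrt(1 / na + 1 / nb))


def classify_d(d):
    """Classify |d| with the thresholds stated in the text.

    Values between 0.49 and 0.50 are treated as small here.
    """
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    return "small"


# Hypothetical body mass values (kg), for illustration only.
males = [84.0, 79.5, 91.2, 76.3, 88.1, 82.4]
females = [70.2, 65.8, 74.9, 68.1, 72.5]

d = cohens_d(males, females)
print(round(d, 2), classify_d(d), round(t_statistic(males, females), 2))
```

In this framework, a "decision" for a variable consists of the significance result together with the effect size category, and a change in either one counts as a changed decision in later comparisons.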
Differences between the complete data set and expected decisions for subsequent iterations were evaluated through Chi-squared (χ2) goodness of fit analysis in Microsoft Excel. It was expected that less than 5% of the decisions would be affected (significant or not significant at the p < 0.05 level, and effect size classification changes as small, medium, or large). Effect size Phi (φ) interpretation was trivial < 0.1, small = 0.1-0.29, medium = 0.3-0.49, and large ≥ 0.5 [23].
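The goodness of fit comparison can be written out as a short sketch. The count of 10 decisions used in the example (five variables, each with a significance decision and an effect size classification) is an assumption for illustration, as are the function names; the original calculation was performed in Microsoft Excel.

```python
from math import erfc, sqrt


def decision_gof(n_decisions, n_changed, expected_rate=0.05):
    """Chi-squared goodness of fit of changed vs. unchanged decisions.

    Expected counts assume expected_rate of decisions change.
    Returns (chi2, p, phi) for one degree of freedom.
    """
    observed = [n_changed, n_decisions - n_changed]
    expected = [n_decisions * expected_rate,
                n_decisions * (1 - expected_rate)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df.
    p = erfc(sqrt(chi2 / 2))
    phi = sqrt(chi2 / n_decisions)
    return chi2, p, phi


def classify_phi(phi):
    """Classify phi with the thresholds stated in the text."""
    if phi >= 0.5:
        return "large"
    if phi >= 0.3:
        return "medium"
    if phi >= 0.1:
        return "small"
    return "trivial"


# Assumed example: 10 decisions, none changed relative to baseline.
chi2, p, phi = decision_gof(10, 0)
print(round(chi2, 3), round(p, 2), classify_phi(phi))
```

Under this assumed decision count, zero changed decisions gives a χ2 of about 0.53 with p ≈ .47, which is close to the value reported for the full dataset with a sex disparity. Because the expected number of changed decisions is well below one, even a few changed decisions produce a large χ2.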

Baseline
The initial comparisons for body composition measures between all females and males included from the NHANES data are shown in Table 2. Significant differences were observed for all measures in the overall group. When age groups for this dataset were considered, significant differences were noted for all variables for 18-29 y. No significant difference in BMI was

Effect of sex disparity
When the entire dataset was evaluated as if a testing disparity was present for sex, decisions were not altered for the overall group. While changes in decisions were noted for two age groups (BMI in 18-29 y and 40-49 y), the impact was not significant (see Table 3 and S1A Table). When a large dataset (N = 187) was evaluated as if a testing disparity was present for sex, a significant difference compared to baseline was observed for the overall randomly drawn sample (see S1B Table). While decision changes were noted for one variable in specific age groups (subscapular skinfold: 18-29 y, 40-49 y; BMI: 30-39 y; body mass: 60-69 y; waist girth: 70+ y), the impact was not significant.
When a small dataset (N = 20) was evaluated as if a testing disparity was present for sex, a significant difference compared to baseline was observed for the overall randomly drawn sample (see S1C Table). Significant differences in decisions and interpretation were also noted for every age category with the exception of the 60-69 y group.

Effect of misgendering
Results for a large dataset (N = 187) evaluated as if 2% of the sample were misgendered are shown in Table 4. It is important to note that differences were not compared to baseline, but to the large dataset that included sex disparities (i.e., S1B Table). A significant difference was observed for the overall randomly drawn sample. While decision changes were noted for at least one variable (generally waist girth) in all age groups but one (40-49 y), the change was significant only in the 50-59 y group.
Results for a large dataset (N = 187) evaluated as if 5% of the sample were misgendered are shown in Table 5. Differences were not compared to baseline, but to the large dataset that included sex disparities (i.e., S1B Table). A significant difference was observed for the overall randomly drawn sample. Significant differences were also noted for each age group except 50-59 y and 60-69 y. The variables most commonly affected were waist girth and body mass.
Results for a small dataset (N = 20) evaluated as if a single female was misgendered are shown in Table 6. Differences were not compared to baseline, but to the small dataset that included sex disparities (i.e., S1C Table). No significant difference was observed for the overall randomly drawn sample. Significant differences were observed for four age groups (18-29 y, 30-39 y, 60-69 y, and 70+ y).
Results for a small dataset (N = 20) evaluated as if a single male was misgendered are shown in Table 7. Differences were not compared to baseline, but to the small dataset that included sex disparities (i.e., S1C Table). A significant difference was observed for the overall randomly drawn sample. Significant differences were observed for four age groups (18-29 y, 30-39 y, 40-49 y, and 70+ y).

Discussion
The purpose of this investigation was to utilize a large NHANES dataset to determine how statistical decisions might be altered when considering the effect of female recruiting disparities, small sample sizes, and the effect of misgendering on body composition measures. It was hypothesized that recruiting an imbalanced ratio of females to males, utilizing small sample sizes, and misgendering individuals would affect the ultimate interpretation of the data. Utilizing a small sample size comparable to what is employed in many sport and exercise science investigations [2,12] affected both decisions (t-test results) and interpretations (effect size) to a greater extent than larger sample sizes. Furthermore, the potential effect of a sex recruiting disparity was amplified in a small sample size. Lastly, the effect of potential misgendering in sport and exercise science research is impactful regardless of the size of the sample recruited.
A number of authors have reported the consistent use of small sample sizes in sport and exercise science-related research [12,13,24]. To our knowledge, no investigations have systematically evaluated the intersection of sample size and sex recruiting disparity in sport and exercise science. Based on the results of the current analysis, the practice of recruiting primarily cisgender males does not appear to alter decisions in large and robust samples when considering body composition measures. However, when utilizing a small sample size and cisgender female ratio common in many sport and exercise science studies, the present data provide evidence that different interpretations occur for the overall sample, as well as almost every age category. While the consequences for sport and exercise science and related disciplines have yet to be determined, our data indicate different decisions may be made for the seemingly straightforward measures of body composition. Not confined to sport and exercise science, the clinical diagnosis and prevention of disease was derived almost exclusively from research on male cell lines, animals, and men [25]. Gender is also largely overlooked in technology and engineering [26]. A concerted effort to address this issue from a public health perspective has emerged, as the National Institutes of Health have required researchers to account for sex as a biological variable in funded preclinical research since 2016 [27,28]. In sport and exercise science, the consequences of persistent underrepresentation are impactful, as cisgender females experience sport and exercise and associated recovery differently than males [1] and suffer from the dearth of appropriate scientific data to make informed decisions about modifying training-based variables [3]. The results of the current investigation raise the very real possibility of false positives in exercise-related disciplines (i.e., test result indicating the presence of a sex-based difference when there really is not one present), as well as the possibility of false negatives (i.e., test result indicating no sex-based difference when there really is one present) due to chronically small sample recruitment skewed against the inclusion of cisgender females.

Table notes (https://doi.org/10.1371/journal.pone.0291526.t007): Values are means (standard deviations). Cohen's d interpreted as small = 0.00-0.49, medium = 0.50-0.79, and large ≥ 0.80 [23]. Effect size φ interpreted as trivial < 0.1, small = 0.1-0.29, medium = 0.30-0.49, and large ≥ 0.5 [23]. Gray indicates a different decision or interpretation than initially made at baseline. NHANES: National Health and Nutrition Examination Survey; χ2: chi-squared; BMI: body mass index in kilograms (kg)•meter−2; mm: millimeters; cm: centimeters.

To further determine the lack of inclusion in exercise-related research, we observed the effect of potential misgendering on the binary sex differences and interpretations associated with body composition metrics. We acknowledge the political hazard this issue has recently become [29] but do not feel ignoring individuals who identify as transgender or nonbinary is an appropriate course. The reporting of gender minorities appears to be the exception, as shown in the IJES self-study where, of over 151,000 participants, a single person was reported as transgender and three as other [2]. While likely not intentional, it is possible that many researchers are simply not allowing the possibility that a recruited individual in an exercise intervention could be transgender or nonbinary. This is problematic because the population presents with unique health disparities [30] that could become exacerbated by the wave of anti-transgender legislation [29]. Because these statutes imply that transgender people are not accepted where people live, work, and play [31], sport and exercise science researchers are urged to utilize inclusive methods for collecting gender data. Several resources exist in other fields that can be adapted for use in sport and exercise science investigations [32–36].
The findings of the current investigation provide evidence that misgendering affects decisions surrounding body composition metrics regardless of the sample size employed. While the current study employed conservative estimates for the percentage of transgender and nonbinary individuals [15,16], these values are likely underestimated, given the stigma that gender minorities face [37]. It is understood that some investigators may be hesitant to include the data for a person they perceive to be in a misaligned category (i.e., a person assigned male at birth who identifies as a female into the female category, or a person assigned female at birth who identifies as a male into the male category). The effects of doing so will likely increase the variability of the dataset. The current findings show that, at least for measures associated with body composition, this variability will change ultimate decisions and interpretations of the data. However, variable data are inherent in sport and exercise-related disciplines [38–40]. Rather than treating these data as unwanted noise to be reduced, it is encouraged that sport and exercise scientists become comfortable with the variability in our data.
This investigation is not without limitations. The participants for each data set were collated from a random draw of the overall population, which may not reflect samples utilized in sport and exercise science research, so the findings should be interpreted with caution. It is acknowledged that, particularly in smaller samples, the influence exerted by a single individual or groups of individuals could significantly change results and successive interpretation. Toward this end, a greater volume of simulations could be conducted to provide a probability to which the influence is expected to extend. Another limitation is that data from the current investigation were confined exclusively to body composition variables. Future studies of this nature could extend the scope to encompass a wider array of health-related variables.
In conclusion, these findings provide evidence that select practices in sport and exercise science testing and research should be revisited. First, the tendency of recruiting cisgender females in disparate numbers compared to cisgender males has consequences regarding ultimate decisions and interpretations, particularly in the small sample sizes that dominate the sport and exercise-related disciplines. A concerted effort should be made toward more equitable representation. Second, sport and exercise scientists should reevaluate the methods by which gender data are collected. Our findings provide evidence that misgendering has significant consequences on ultimate decisions and interpretations of data, regardless of the sample size employed. A relatively innocuous first step is providing more than traditional binary options when collecting the gender identity of participants. Inclusiveness is needed in helping all individuals feel valued and respected when participating in sport and exercise science-related research. Moreover, inclusive research will promote more equitable access to health-related data that are useful to all people, not only cisgender males.

Table 4. Female and male NHANES body composition measures using a large dataset (N = 187) as if 2% of the sample were misgendered.
Results were compared with those expected from the large dataset with sex disparities using χ2.

Table 5. Female and male NHANES body composition measures using a large dataset (N = 187) as if 5% of the sample were misgendered.
Results were compared with those expected from the large dataset with sex disparities using χ2.